Finding Consistent Global Checkpoints in a Distributed Computation

Authors

  • D. Manivannan
  • Robert H. B. Netzer
  • Mukesh Singhal

Abstract

Finding consistent global checkpoints of a distributed computation is important for analyzing, testing, or verifying properties of these computations. In this paper we present a theoretical foundation for finding consistent global checkpoints. Given an arbitrary set S of local checkpoints, we prove exactly which sets of other local checkpoints can be combined with S to build consistent global checkpoints, and we present an algorithm for finding all such global checkpoints. The minimal and maximal consistent global checkpoints are presented as special cases. The results are based on the notion of zigzag paths introduced by Netzer and Xu [14]. We also present a method for finding zigzag paths using the rollback-dependency graph introduced by Wang [17, 16].
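
The Python sketch below is a rough illustration of the abstract's main characterization. It makes several assumptions:

- It encodes a run as per-process checkpoint counts plus a list of messages. This encoding is hypothetical and our own.
- It builds a rollback-dependency graph in the style commonly attributed to Wang.
- It detects zigzag paths by graph reachability. This relies on an assumed equivalence: a zigzag path runs from C_{i,x} to C_{j,y} exactly when the graph has a path from C_{i,x+1} to C_{j,y}, for checkpoints on different processes or for a checkpoint and itself.

It then applies the Netzer and Xu condition: S can be extended to a consistent global checkpoint only if no zigzag path joins any two of its members, including a member and itself. All function names are hypothetical. The brute-force enumeration is a simple stand-in, not the algorithm presented in the paper.

```python
from collections import deque
from itertools import product

# Hypothetical encoding (an assumption, not the paper's notation):
#   num_ckpts[i]  number of checkpoints of process i, indexed 0..num_ckpts[i]-1;
#                 index 0 is the initial checkpoint and the last one is assumed
#                 to be taken after every event of process i.
#   (i, x, j, y)  a message sent by process i between its checkpoints x-1 and x,
#                 received by process j between its checkpoints y-1 and y (i != j).


def build_rgraph(num_ckpts, messages):
    """Rollback-dependency graph: node (i, x) stands for checkpoint C_{i,x}."""
    succ = {(i, x): set() for i, n in enumerate(num_ckpts) for x in range(n)}
    for i, n in enumerate(num_ckpts):
        for x in range(n - 1):
            succ[(i, x)].add((i, x + 1))  # edge C_{i,x} -> C_{i,x+1}
    for i, x, j, y in messages:
        succ[(i, x)].add((j, y))  # edge from sender interval to receiver interval
    return succ


def reachable(succ, start):
    """Set of nodes reachable from start (start included), found by BFS."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in succ[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def zpath(succ, a, b):
    """True if a zigzag path leads from checkpoint a to checkpoint b.

    Tested as an R-graph path from a's successor C_{i,x+1} to b; valid when
    a and b are on different processes, or when a == b (a Z-cycle)."""
    i, x = a
    nxt = (i, x + 1)
    return nxt in succ and b in reachable(succ, nxt)


def extendable(succ, S):
    """S maps process -> checkpoint index. S extends to a consistent global
    checkpoint iff no Z-path joins two members of S (or a member to itself)."""
    members = list(S.items())
    return not any(zpath(succ, a, b) for a in members for b in members)


def all_consistent_extensions(num_ckpts, succ, S):
    """Brute force: try every checkpoint choice for the processes missing from S."""
    others = [i for i in range(len(num_ckpts)) if i not in S]
    for choice in product(*(range(num_ckpts[i]) for i in others)):
        g = {**S, **dict(zip(others, choice))}
        if extendable(succ, g):  # a complete set is consistent iff extendable
            yield tuple(g[i] for i in range(len(num_ckpts)))


# P0 sends m1, receives m2, then takes C_{0,1}.
# P1 receives m1, takes C_{1,1}, then sends m2.
num_ckpts = [3, 3]
messages = [(0, 1, 1, 1),  # m1: P0 -> P1
            (1, 2, 0, 1)]  # m2: P1 -> P0
succ = build_rgraph(num_ckpts, messages)
print(extendable(succ, {1: 1}))                                  # False: Z-cycle
print(list(all_consistent_extensions(num_ckpts, succ, {0: 1})))  # [(1, 2)]
```

In the example, C_{1,1} lies on a Z-cycle, which makes it useless. The cycle runs as follows:

- m2 is sent after C_{1,1} and received by P0 before C_{0,1}.
- m1 was sent in that same interval of P0 and received by P1 before C_{1,1}.

The only consistent global checkpoint containing C_{0,1} is therefore {C_{0,1}, C_{1,2}}. The enumeration is exponential in the number of processes and only makes the characterization concrete.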

Similar resources

Progressive Construction of Consistent Global Checkpoints

A checkpoint pattern is an abstraction of the computation performed by a distributed application. A progressive view of this abstraction is formed by a sequence of consistent global checkpoints that may have occurred in this order during the execution of the application. Considering pairs of checkpoints, we have determined that a checkpoint must be observed before another in a progressive view ...

Necessary and sufficient conditions for transaction-consistent global checkpoints in a distributed database system

Checkpointing and rollback recovery are well-known techniques for handling failures in distributed systems. The issues related to the design and implementation of efficient checkpointing and recovery techniques for distributed systems have been thoroughly understood. For example, the necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint has be...

Characterization of Consistent Global Checkpoints in Large-Scale Distributed Systems

Backward error recovery is one of the most used schemes to ensure fault-tolerance in distributed systems. It consists, upon the occurrence of a failure, in restoring a distributed computation in an error-free global state from which it can be resumed to produce a correct behaviour. Checkpointing is one of the techniques to pursue the backward error recovery. As we consider large-scale distribut...

Cycle Prevention in Distributed Checkpointing

A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. Given a set of processes that take basic local checkpoints in an independent and unknown way, this paper presents a predicate that directs processes to take additional local forced checkpoints in order to ensure that no local checkpoint be useless. This predicate has two noteworthy properties: it can b...

An Efficient Recovery Mechanism with Checkpointing Approach for Cluster Federation

Checkpoint and recovery protocols are commonly used in distributed applications for providing fault tolerance. A distributed system may require taking checkpoints from time to time to keep it free of arbitrary failures. In case of failure, the system will rollback to checkpoints where global consistency is preserved. Checkpointing is one of the fault-tolerant techniques to restore faults and to...

Journal:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 8

Publication date: 1997